For numerical data (meaning interval and ratio data), the main issue is how much precision to
record. Recording a numeric value to as many decimals as you have available is usually best. For
example, if a scale can measure body weight to the nearest tenth of a kilogram, record it in the
database to that degree of precision. You can always round off to the nearest kilogram later if you
want, but you can never “unround” a number to recover digits you didn’t record. So it’s best to
record values in your data from measurement instruments to the degree of precision provided.
Along the same lines, don’t group numerical data into intervals when recording it. If you know the age
to the nearest year, don’t record Age in 10-year intervals (such as 20 to 29, 30 to 39, 40 to 49, and so
on). You can always have the computer do that kind of grouping later, but you can never recover the
age in years if all you record is the decade.
Some statistical programs let you store numbers in different formats. The program may refer to these
different storage modes using arcane terms for short, long, or very long integers (whole numbers) or
single-precision (short) or double-precision (long) floating point (fractional) numbers. Each type has
its own limits, which may vary from one program to another or from one kind of computer to another.
For example, a short integer may be able to represent only whole numbers within the range from
, whereas a double-precision floating-point number could easily handle a number
like
. Excel has no trouble storing numerical data in any of these formats, so
to make these choices, it is best to study the statistical program you will use to analyze the data. That
way, you can make rules for storing the data in Excel that make it easy for you to analyze the data once
it is imported into the statistical program.
Following are issues to consider with respect to numerical variables in Excel:
Don’t put two numbers (such as a blood pressure reading of
mmHg) into one column of
data. Excel won’t complain about it, but it will treat it as text because of the embedded “/”, rather
than as numerical data. Instead, create two separate variables and enter each number into the
appropriate variable.
When recording multiple types of measurements (such as days, weeks, months, and years), use two
columns to record the data (such as time and type). In the first column, store the value of the
variable, and in the second column, store a code to indicate the type (such as 1 = days, 2 = weeks,
3 = months, and 4 = years). As an example, “3 weeks” would be entered as a 3 in the time
column and 2 in the type column.
Missing numerical data requires a little more thought than missing categorical data. Some
researchers use 99 (or 999, or 9999) to indicate a missing value in categorical data, but this
approach should not be used for numeric data (because the statistical program will see these
values as actual measured values, and not codes for missing data). The simplest technique for